Speech synthesis in dialogue systems
Abstract
This paper argues that dialogue synthesis has special requirements. Users expect a high level of naturalness. We argue for good segmental rendering with the addition of pragmatic effects, such as emotion and attitude. We discuss the language model used, emphasising the phonological and phonetic levels essential to handle these effects. We draw on the theories of Cognitive Phonetics and Pragmatic Phonetics.

INTRODUCTION

Dialogue involves a human user and a computer interacting to exchange information. A telephone inquiry system is an example of dialogue-based interaction for requesting and providing information. Dialogue systems incorporate several identifiable elements, including a dialogue controller [1] and input/output subsystems taking the form of automatic speech recognition and speech synthesis. In this paper we address a number of questions that determine how to provide a synthesis module to meet the demands of dialogue. In particular we consider: general prerequisites of synthesis; special requirements for synthesis in dialogue systems; modelling the requirements in the dialogue context; and implementation of the model. We assume here that other components of the overall system have determined such matters as what is to be spoken and how it is to be spoken. However, the implementation of the output will determine the formulation of these other components of the system.

GENERAL PREREQUISITES

To ensure user acceptability in a wide range of environments, the speech produced by a synthesiser must be of high quality. Almost all synthesisers now produce highly intelligible output, and listeners have little difficulty in determining the intended message. Although there are no universally accepted formal evaluation procedures for synthetic speech, there is little disagreement that the threshold of good intelligibility has generally been passed. Much of the achievement is due to improvements in the way the sound wave itself is generated.
To address the question of general requirements satisfactorily we need to make a distinction between low level synthesis and high level synthesis.
Similar resources
Modeling Lateral Communication in Holonic Multi Agent Systems
Agents, in a multi agent system, communicate with each other through the process of exchanging messages which is called dialogue. Multi agent organization is generally used to optimize agents’ communications. Holonic organization demonstrates a self-similar recursive and hierarchical structure in which each holon may include some other holons. In a holonic system, lateral communication occurs b...
A Phonetic Adaptation Module for Spoken Dialogue Systems
This paper presents a novel component for spoken dialogue systems, which adds the functionality of adapting the system’s speech output based on the user’s input. The adaptation is done on the phonetic level for adopting the user’s speech characteristics without changing the system’s own voice. An architecture for a spoken dialogue system is introduced, in which this module creates a direct link...
Spoken dialogue systems
During the past decade there have been substantial improvements in natural language processing, speech recognition and text-to-speech conversion. Spoken dialogue is the missing component between these, making it possible to develop conversational interfaces to computers. In this paper I introduce the basics of spoken dialogue systems and present some applications using these techniques.
Incremental speech synthesis
Human interaction with spoken dialogue systems differs in many ways from human interaction with other humans. One notable example is that spoken dialogue systems tend to have a strict concept of turns, which makes the dialogue more similar to a ping-pong game than to humans conversing. Given that we aim at creating spoken dialogue systems that can engage in human-like conversation (note that altho...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics in the multimedia domain is enabling computers to produce speech. Speech synthesis gives the computer the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...